-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dataset integration tests - Tables, Folders #1391
Conversation
…+ aws clients for dataset
…egration-tests-datasets-pt2 # Conflicts: # tests_new/integration_tests/core/environment/global_conftest.py # tests_new/integration_tests/modules/s3_datasets/global_conftest.py # tests_new/integration_tests/modules/s3_datasets/test_s3_dataset.py
tests_new/integration_tests/modules/s3_datasets/global_conftest.py
Outdated
Show resolved
Hide resolved
tests_new/integration_tests/modules/s3_datasets/global_conftest.py
Outdated
Show resolved
Hide resolved
…x issue with retry decorator
dataset_name, client, group, env, bucket=None, kms_alias=None, glue_database=None | ||
): | ||
dataset_name = 'persistent_s3_dataset1' | ||
def get_or_create_persistent_s3_dataset(dataset_name, client, group, env, bucket=None, kms_alias='', glue_database=''): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataset_name
is further used only for bucket naming and tags. Let's make it a dataset name as well. Otherwise all datasets has the same name TestDatasetCreated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
93b0997
to
6e9fbb1
Compare
6e9fbb1
to
5358677
Compare
Tested with latest changes locally and in a real CICD in AWS |
7e51219
to
4084a3b
Compare
4084a3b
to
0eb4f73
Compare
0eb4f73
to
590909b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested in my deployment. Good to go
84dcfea
to
0674531
Compare
).contains('UnauthorizedOperation', 'PROFILE_DATASET_TABLE', dataset_uri) | ||
|
||
|
||
def test_list_table_profiling_runs(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could mark them as @pytest.mark.skip(reason="no way of currently testing this")
to avoid polluting the reports
### Feature or Bugfix <!-- please choose --> - Feature ### Detail - Adding integration tests for Dataset Table Data Filters - PENDING TESTS PASSING IN DEV AWS ENV - Merge after #1391 ### Relates - related to #1220 and #1358 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: dlpzx <dlpzx@amazon.com>
#1533) ### Feature or Bugfix - Feature: Tests TO BE MERGED AFTER #1391 ### Detail Follow-up of #1391. This PR adds: - Tests for profiling jobs - because it is an easy submodule I decided to "chain" the tests and make them one dependent on the next one. I could also create a fixture for a profiling job (check warning, profiling jobs cannot be deleted) - Added missing tests in datasets_base - we still need to add redshift datasets and other types of datasets every time there is a new dataset added. - Added missing tests in s3_datasets: test_list_s3_datasets_owned_by_env_group.⚠️ Issues discovered during testing. They are not bugs, they are missing functionalities: - Profiling jobs can never be deleted. It is just information on the RDS database, but nevertheless it cannot be deleted. - It would be nice to have an API that checks the status of a Glue crawler ### Relates - #1358 - #1391 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: Noah Paige <noahpaig@amazon.com>
### Feature or Bugfix - Feature: Testing ### Detail Follow-up of #1391 - Implement Table Column tests ### Relates - #1358 - #1391 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: Noah Paige <noahpaig@amazon.com>
Feature or Bugfix
Detail
In this PR we add new fixtures for S3 datasets that are used for the tests in S3/tables/folders but also for the tests developed in ##1389:
This PR moves dataset_base testing scenarios to datasets_base/test_dataset.py. Testing scenarios have been defined for the S3 datasets and the remaining test scenarios for the datasets_base APIs are defined with their signature and a TODO comment.
It also splits the S3 dataset tests into their corresponding API subcategories (in
backend/.../s3_datasets/api
)Implement testing scenarios for
test_s3_folders
covering all APIs and dataset types (parametrized tests). Note that to avoid duplication of tests, unauthorized test cases are tested with only one of the dataset types as the code executed is the same for all cases.Implement testing scenarios for
test_s3_tables
covering all APIs and dataset types (parametrized tests). Same as folders, unauthorized tests are performed on a single dataset type. New tests include: sync_tables with real tables, preview tables with real tables, preview unauthorized depending on the confidentiality level, get_dataset_level, list_dataset_tablesFor
test_s3_datasets
only test_create_dataset_unauthorized is added, but for other existing tests we add test for all dataset types (parametrized tests).Next steps
In follow-up PRs we should implement the missing commented TODO tests for:
Relates
Security
Please answer the questions below briefly where applicable, or write
N/A
. Based onOWASP 10.
fetching data from storage outside the application (e.g. a database, an S3 bucket)?
eval
or similar functions are used?By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.