Add Dataset integration tests - Dataset CRUD + actions outside of data.all #1379
Conversation
Testing locally
Testing in AWS
Tested in AWS and all checks are passing.
```python
# TODO: Come up with a better way to handle wait-in-progress, if applicable.
# Use time.sleep() instead of a poller because of the case where no changes
# are found (i.e. no update required).
# check_stack_in_progress(client1, env_uri, stack_uri)
time.sleep(10)
```
I think 10 seconds is not enough to make sure that cdkproxy issues a `cdk deploy`. It takes about 30 seconds for Fargate to boot the container, another 30 seconds to execute the `cdk deploy`, and then probably a few more seconds for the `cdk deploy` to return "no diffs".
The best solution would be to make `updateStack` return a taskId and then expose the APIs (that we already have in the backend) to track the status of this taskId. But a good-enough solution would be the following: `updateStack`/`getStack` return a list of events. When we issue the `updateStack` we can get the latest event, then we can poll with `getStack` until we get a new event; at that point we will know for sure that the `cdk deploy` has started executing.
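A minimal sketch of that polling idea, assuming hypothetical `get_stack`/`update_stack` client wrappers that expose the stack's CFN event list (the actual data.all client calls may differ):

```python
import time

def wait_for_deploy_start(client, env_uri, stack_uri, timeout=120, interval=5):
    """Poll getStack until a new CFN event appears, i.e. cdk deploy has started."""
    # Remember the latest event before triggering the update.
    baseline = client.get_stack(env_uri, stack_uri)['events'][-1]
    client.update_stack(env_uri, stack_uri)
    deadline = time.time() + timeout
    while time.time() < deadline:
        latest = client.get_stack(env_uri, stack_uri)['events'][-1]
        if latest != baseline:
            return  # a new event appeared: the deploy is underway
        time.sleep(interval)
    raise TimeoutError(f'no new stack event within {timeout}s for {stack_uri}')
```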
When writing this I realised that if there are no changes in the stack then maybe `cdk deploy` doesn't generate a new CFN event, so we need to double-check that.
You can disregard the above comments: when there are no changes in the stack, `cdk deploy` doesn't generate new events. As an alternative we can use `getStackLogs`.
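A similarly hedged sketch of the `getStackLogs` alternative, assuming a wrapper that returns the cdkproxy task's log lines; the completion markers checked here are illustrative assumptions, not guaranteed cdk output:

```python
import time

def wait_for_cdk_deploy(client, env_uri, stack_uri, timeout=180, interval=10):
    """Poll getStackLogs until the cdkproxy task log shows the deploy finished."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Assumed shape: a list of log lines from the cdkproxy ECS task.
        logs = client.get_stack_logs(env_uri, stack_uri) or []
        if any('no changes' in line or 'Deployment time' in line for line in logs):
            return  # cdk reported either a no-op or a completed deploy
        time.sleep(interval)
    raise TimeoutError(f'cdk deploy did not report completion within {timeout}s')
```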
For the time being I increased the sleep time to give ECS/CFN time to update.
I'd only suggest increasing the timeout to 2 minutes, as provisioning + running of cdkproxy takes about 1 minute.
### Feature or Bugfix
- Feature: Testing

### Detail
In this PR we add new fixtures for S3 datasets that are used for the tests in S3/tables/folders but also for the tests developed in #1389:
- Fix imported KMS dataset - there was an error in the KMS keys and in the registration of the Glue database
- Folders as a separate fixture using the create_folder data.all API
- Tables as a separate fixture using boto3 calls to create the table, upload data and then use the sync_tables data.all API - the data can be queried!

This PR moves dataset_base testing scenarios to datasets_base/test_dataset.py. Testing scenarios have been defined for the S3 datasets, and the remaining test scenarios for the datasets_base APIs are defined with their signature and a TODO comment.

It also splits the S3 dataset tests into their corresponding API subcategories (in `backend/.../s3_datasets/api`):
- test_s3_datasets
- test_s3_tables
- test_s3_tables_profiling
- test_s3_tables_columns
- test_s3_folders

Implement testing scenarios for `test_s3_folders` covering all APIs and dataset types (parametrized tests). Note that to avoid duplication of tests, unauthorized test cases are tested with only one of the dataset types, as the code executed is the same for all cases.

Implement testing scenarios for `test_s3_tables` covering all APIs and dataset types (parametrized tests). As with folders, unauthorized tests are performed on a single dataset type. New tests include: sync_tables with real tables, preview tables with real tables, preview unauthorized depending on the confidentiality level, get_dataset_level, list_dataset_tables.

For `test_s3_datasets` only test_create_dataset_unauthorized is added, but for other existing tests we add tests for all dataset types (parametrized tests).

#### Next steps
In follow-up PRs we should implement the missing commented TODO tests for:
- datasets_base ---> list owned tests
- s3_datasets ---> list owned tests
- s3_tables ---> data filters tests
- s3_tables_profiling ---> some tests
- s3_tables_columns ---> all tests
- Review backwards compatibility tests and add table and folder test cases

### Relates
- #1379

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Co-authored-by: dlpzx <dlpzx@amazon.com>
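As a rough illustration of the parametrized-test shape described above, the following pytest sketch runs one test body once per dataset type; the fixture and client method names are assumptions for illustration, not necessarily the exact ones in `tests_new/`:

```python
import pytest

# One fixture name per dataset type; assumed names, not necessarily
# the exact fixtures defined in global_conftest.py.
DATASET_FIXTURES = [
    'session_s3_dataset1',
    'session_imported_sse_s3_dataset1',
    'session_imported_kms_s3_dataset1',
]

@pytest.mark.parametrize('dataset_fixture_name', DATASET_FIXTURES)
def test_list_dataset_tables(client1, dataset_fixture_name, request):
    # getfixturevalue resolves the fixture chosen by the parametrization,
    # so the same assertions run against every dataset type.
    dataset = request.getfixturevalue(dataset_fixture_name)
    tables = client1.list_dataset_tables(datasetUri=dataset.datasetUri)
    assert tables, 'expected the synced table fixtures to be listed'
```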
### Feature or Bugfix

### Detail
It implements some tests for s3_datasets (check full list in #1358)
For fresh deployments
### 🔦 AWS actions outside of data.all
There are some actions that in real life are performed outside of data.all. To run the tests we need to either perform these actions manually before the tests are executed, or use the AWS SDK to automate them. The most important of these actions are described below.
To create resources we need to assume a role in the environment account. We could assume the pivot role, but then we would need to ensure that it has CreateBucket... permissions, which is not the case. I have opted to create a separate, isolated role, `dataall-integration-tests-role`, as part of the environment stack ONLY when we are creating environments during integration testing. As part of the global config of environments, users can use the boto3 session of this role to perform direct AWS calls in the environment account.

In https://github.com/data-dot-all/dataall/pull/1382/files we discussed some alternatives. In this PR we use the `environmentType` variable in the environment model, which was not used for anything (it always defaulted to Data environments): the create environment API call (input: environmentType = IntegrationTesting) ---> in the environment stack we check the type of environment and deploy the integration test role. Then we use an SSM parameter to read the tooling account id needed for the assume-role trust policy.
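A minimal sketch of how the tests can obtain a boto3 session in the environment account via that role; the role name follows the description above, while the helper name and session name are illustrative assumptions:

```python
import boto3

def get_environment_session(env_account_id: str, region: str) -> boto3.Session:
    """Assume dataall-integration-tests-role and return a boto3 session for it."""
    # The tooling account id only matters for the role's trust policy at
    # stack-deployment time; at test time we just assume the role.
    sts = boto3.client('sts')
    creds = sts.assume_role(
        RoleArn=f'arn:aws:iam::{env_account_id}:role/dataall-integration-tests-role',
        RoleSessionName='dataall-integration-tests',
    )['Credentials']
    return boto3.Session(
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'],
        region_name=region,
    )
```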
### Relates
### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.